Unfooling Perturbation-Based Post Hoc Explainers

Authors

Abstract

Monumental advancements in artificial intelligence (AI) have lured the interest of doctors, lenders, judges, and other professionals. While these high-stakes decision-makers are optimistic about the technology, those familiar with AI systems are wary about the lack of transparency of its decision-making processes. Perturbation-based post hoc explainers offer a model-agnostic means of interpreting these systems while only requiring query-level access. However, recent work demonstrates that these explainers can be fooled adversarially. This discovery has adverse implications for auditors, regulators, and other sentinels. With this in mind, several natural questions arise: how can we audit these black box systems? And how can we ascertain that the auditee is complying with the audit in good faith? In this work, we rigorously formalize this problem and devise a defense against adversarial attacks on perturbation-based explainers. We propose algorithms for the detection (CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP. The code is available at https://github.com/craymichael/unfooling.
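The abstract names the detection and defense algorithms but does not spell out how they work. The following Python sketch illustrates the general idea of conditional anomaly detection against an explainer-fooling black box. It is not the authors' code, which lives in the repository linked above. Every name is a hypothetical assumption: `knn_cad_score`, `knn_label`, `detect`, `defend`, the thresholds `tau` and `max_rate`, and the toy models. So is the scoring rule, which is the fraction of the k nearest audit inputs whose recorded output disagrees with the black box's answer. The attack is modeled loosely on published "scaffolding" attacks on LIME and SHAP. The black box checks whether a query looks off-manifold, as explainer perturbations often do, and if so answers with an innocuous model.

```python
import math
import random
from collections import Counter


def _nearest(x, ref_X, k):
    """Indices of the k reference inputs closest to x (Euclidean distance)."""
    return sorted(range(len(ref_X)), key=lambda i: math.dist(x, ref_X[i]))[:k]


def knn_cad_score(x, y, ref_X, ref_y, k=5):
    """Hypothetical KNN conditional anomaly score: the fraction of the k nearest
    reference inputs whose recorded black-box output differs from y."""
    idx = _nearest(x, ref_X, k)
    return sum(ref_y[i] != y for i in idx) / len(idx)


def knn_label(x, ref_X, ref_y, k=5):
    """Majority output among the k nearest reference inputs."""
    idx = _nearest(x, ref_X, k)
    return Counter(ref_y[i] for i in idx).most_common(1)[0][0]


def detect(black_box, samples, ref_X, ref_y, tau=0.5, max_rate=0.1):
    """Detection sketch: flag the black box when more than max_rate of its
    answers on explainer perturbations score above tau."""
    n_anomalous = sum(
        knn_cad_score(z, black_box(z), ref_X, ref_y) > tau for z in samples
    )
    return n_anomalous / len(samples) > max_rate


def defend(black_box, samples, ref_X, ref_y, tau=0.5):
    """Defense sketch: swap each anomalous answer for the KNN-predicted
    output before the explainer sees it."""
    labels = []
    for z in samples:
        y = black_box(z)
        if knn_cad_score(z, y, ref_X, ref_y) > tau:
            y = knn_label(z, ref_X, ref_y)
        labels.append(y)
    return labels


# Toy setting (assumed): real inputs lie on the line x1 = 0, and x0 is a
# sensitive feature the deployed model should not rely on.
def biased(x):
    return int(x[0] > 0)


def innocuous(x):
    return int(x[1] > 0)


def adversarial(x):
    # Scaffolding-style attack: off-manifold queries get the innocuous model;
    # real inputs (x1 = 0) get the biased one.
    return innocuous(x) if abs(x[1]) > 0.2 else biased(x)


rng = random.Random(0)
ref_X = [(rng.uniform(-1, 1), 0.0) for _ in range(200)]  # audit data on the manifold
perturbations = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
ref_y = [adversarial(x) for x in ref_X]  # outputs the auditor recorded on real data

print("adversarial flagged:", detect(adversarial, perturbations, ref_X, ref_y))
print("honest flagged:", detect(biased, perturbations, ref_X, [biased(x) for x in ref_X]))

defended = defend(adversarial, perturbations, ref_X, ref_y)
agree = sum(y == biased(z) for y, z in zip(defended, perturbations)) / len(perturbations)
print(f"defended labels matching the biased model: {agree:.2f}")
```

In this toy, about 40% of the adversarial box's answers on perturbations conflict with its behavior on nearby real inputs, so it should be flagged. The honest (openly biased) model should stay near zero. A perturbation-based explainer fitted to `defended` rather than to the raw answers should therefore see the model's reliance on x0. The KNN majority vote here is a simple stand-in for whatever conditional model a real audit would use.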

Similar resources

Colors as explainers?

Byrne & Hilbert (B&H) argue that colors are reflectance properties of objects. They also claim that a necessary condition for something’s being a color is that it causally explain – or be causally implicated in the explanation of – our perceptions of color. I argue that these two positions

Post Hoc Tests

Familywise Error

Familywise error (FWE) is also known as alpha inflation or cumulative Type I error. Familywise error represents the probability that any one of a set of comparisons or significance tests is a Type I error. As more tests are conducted, the likelihood that one or more are significant just due to chance (Type I error) increases. One can estimate familywise error with the following...
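The snippet is cut off before its formula. Assuming c independent tests, each run at level alpha, the usual estimate is alpha_FW = 1 - (1 - alpha)^c. The Python sketch here computes it alongside the Bonferroni-adjusted per-test level. The function names are hypothetical.

```python
def familywise_error(alpha, c):
    """Probability of at least one Type I error among c independent tests,
    each run at level alpha: 1 - (1 - alpha) ** c."""
    return 1 - (1 - alpha) ** c


def bonferroni_alpha(alpha_fw, c):
    """Per-test level alpha_fw / c, which keeps the familywise error at or
    below alpha_fw (by Boole's inequality; independence not required)."""
    return alpha_fw / c


print(round(familywise_error(0.05, 10), 3))                          # 0.401
print(round(familywise_error(bonferroni_alpha(0.05, 10), 10), 4))    # 0.0489
```

Ten independent tests at 0.05 thus carry roughly a 40% chance of at least one false positive. Testing each at 0.005 brings that back under 5%.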

Post hoc subgroup analysis.

I am surprised by the published conclusions of Hung et al.¹ in a recent issue of CHEST (August 2013). The authors’ stated methods are not adequate to support their conclusion that hyperimmune IV immunoglobulin (H-IVIG) benefits mortality if given within 5 days to patients with severe 2009 influenza A(H1N1) infection. The primary outcome analysis of 34 patients presented in Table 1 of the st...

Post-Hoc Comparisons

The F test used in analysis of variance (ANOVA) is called an omnibus test because it can detect only the presence or the absence of a global effect of the independent variable on the dependent variable. However, in general we want to draw specific conclusions from the results of an experiment. Specific conclusions are derived from focused comparisons which are, mostly, implemented as contrasts ...
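The snippet stops at the idea of a contrast. A contrast assigns weights c_i that sum to zero to the group means, giving psi = sum(c_i * mean_i). Its one-degree-of-freedom sum of squares is SS_psi = psi^2 / sum(c_i^2 / n_i). Here is a minimal Python sketch; the function name and data are hypothetical.

```python
def contrast(means, weights, ns):
    """Single-df contrast in a one-way ANOVA.

    Returns psi = sum(c_i * mean_i) and SS_psi = psi**2 / sum(c_i**2 / n_i).
    """
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    psi = sum(c * m for c, m in zip(weights, means))
    ss = psi ** 2 / sum(c ** 2 / n for c, n in zip(weights, ns))
    return psi, ss


# Hypothetical data: does group 3 differ from the average of groups 1 and 2?
psi, ss = contrast(means=[10.0, 12.0, 17.0], weights=[1, 1, -2], ns=[5, 5, 5])
print(psi, round(ss, 6))  # -12.0 120.0
```

Dividing SS_psi by the within-group mean square from the ANOVA table gives an F statistic with (1, N - a) degrees of freedom. That is the focused test the omnibus F cannot provide.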

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i6.25847